My Datascience Journey

    EDA on Telecom Churn Data

    The objectives of this project are:
    1. Perform exploratory analysis and extract insights from the dataset.
    2. Split the dataset into train/test sets and explain the reasoning behind the split.
    3. Build a predictive model to identify which customers are likely to churn, and explain why a particular algorithm was chosen.
    4. Establish metrics to evaluate model performance.
    5. Discuss potential issues with deploying the model into production.
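For objective 2, churners are usually a minority class. A purely random split can therefore leave the test set with a noticeably different churn rate from the training set. A stratified split avoids this. The sketch below shows one way to do it with plain pandas (1.1 or later). It assumes the target column is named `churn` and the DataFrame has a unique index. `stratified_split` is a hypothetical helper, and the toy frame is made up for illustration. scikit-learn's `train_test_split(..., stratify=y)` achieves the same result.

```python
from typing import Tuple

import pandas as pd


def stratified_split(df: pd.DataFrame, target: str, test_frac: float = 0.2,
                     seed: int = 42) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hold out `test_frac` of the rows *within each class* of `target`,
    so train and test keep (approximately) the same class proportions."""
    # Sample the test rows separately from every class of the target
    test = df.groupby(target, group_keys=False).sample(frac=test_frac, random_state=seed)
    # Everything not sampled goes to train (requires a unique index)
    train = df.drop(index=test.index)
    return train, test


# Toy data (hypothetical): 8 non-churners and 2 churners, i.e. a 20% churn rate
toy = pd.DataFrame({"churn": [False] * 8 + [True] * 2, "x": range(10)})
train, test = stratified_split(toy, "churn", test_frac=0.5)
print(train["churn"].mean(), test["churn"].mean())
```

Once the data is loaded, `stratified_split(telecom_churn, "churn")` would hold out roughly 20% of the customers in each class.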

    Import the required libraries

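    # Python version: 3.8.2
    import warnings
    import numpy as np
    import pandas as pd
    # The profiling package was renamed from pandas-profiling to ydata-profiling;
    # try the new name first and fall back to the old one
    try:
        from ydata_profiling import ProfileReport
    except ImportError:
        from pandas_profiling import ProfileReport
    warnings.filterwarnings('ignore')
    # Option to display all columns
    pd.set_option('display.max_columns', None)
    # Read the data
    telecom_churn = pd.read_csv('../data/telecom_data/telecom.csv')
    telecom_churn.head(10)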
    state account length area code phone number international plan voice mail plan number vmail messages total day minutes total day calls total day charge total eve minutes total eve calls total eve charge total night minutes total night calls total night charge total intl minutes total intl calls total intl charge customer service calls churn
    0 KS 128 415 382-4657 no yes 25 265.1 110 45.07 197.4 99 16.78 244.7 91 11.01 10.0 3 2.70 1 False
    1 OH 107 415 371-7191 no yes 26 161.6 123 27.47 195.5 103 16.62 254.4 103 11.45 13.7 3 3.70 1 False
    2 NJ 137 415 358-1921 no no 0 243.4 114 41.38 121.2 110 10.30 162.6 104 7.32 12.2 5 3.29 0 False
    3 OH 84 408 375-9999 yes no 0 299.4 71 50.90 61.9 88 5.26 196.9 89 8.86 6.6 7 1.78 2 False
    4 OK 75 415 330-6626 yes no 0 166.7 113 28.34 148.3 122 12.61 186.9 121 8.41 10.1 3 2.73 3 False
    5 AL 118 510 391-8027 yes no 0 223.4 98 37.98 220.6 101 18.75 203.9 118 9.18 6.3 6 1.70 0 False
    6 MA 121 510 355-9993 no yes 24 218.2 88 37.09 348.5 108 29.62 212.6 118 9.57 7.5 7 2.03 3 False
    7 MO 147 415 329-9001 yes no 0 157.0 79 26.69 103.1 94 8.76 211.8 96 9.53 7.1 6 1.92 0 False
    8 LA 117 408 335-4719 no no 0 184.5 97 31.37 351.6 80 29.89 215.8 90 9.71 8.7 4 2.35 1 False
    9 WV 141 415 330-8173 yes yes 37 258.6 84 43.96 222.0 111 18.87 326.4 97 14.69 11.2 5 3.02 0 False
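In the ten rows printed above, every `total day charge` appears to equal the matching `total day minutes` times a fixed rate of about 0.17, rounded to cents. For example, 265.1 × 0.17 ≈ 45.07. The sketch below checks this on those rows only, using values copied from the output. `per_minute_rates` is a hypothetical helper. If the same pattern holds across the full dataset, and for the evening, night and international columns, then each charge column is redundant with its minutes column. Keeping both would add perfectly collinear features to a model.

```python
# Day minutes and day charges copied from the ten rows printed above
day_minutes = [265.1, 161.6, 243.4, 299.4, 166.7, 223.4, 218.2, 157.0, 184.5, 258.6]
day_charge = [45.07, 27.47, 41.38, 50.90, 28.34, 37.98, 37.09, 26.69, 31.37, 43.96]


def per_minute_rates(minutes, charges):
    """Return charge / minutes for every row with non-zero minutes."""
    return [c / m for m, c in zip(minutes, charges) if m > 0]


rates = per_minute_rates(day_minutes, day_charge)
# A narrow min/max range means the charge is (almost) a constant multiple of minutes
print(min(rates), max(rates))
```

On the full DataFrame, dividing the two columns and calling `.describe()` would show whether the ratio stays constant. Rows with zero minutes would need to be excluded first.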

    Check the Shape and Column types of the Dataframe

    telecom_churn.shape
    (3333, 21)
    telecom_churn.dtypes
    state                      object
    account length              int64
    area code                   int64
    phone number               object
    international plan         object
    voice mail plan            object
    number vmail messages       int64
    total day minutes         float64
    total day calls             int64
    total day charge          float64
    total eve minutes         float64
    total eve calls             int64
    total eve charge          float64
    total night minutes       float64
    total night calls           int64
    total night charge        float64
    total intl minutes        float64
    total intl calls            int64
    total intl charge         float64
    customer service calls      int64
    churn                        bool
    dtype: object
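The dtypes above suggest a few type fixes before modelling:

- `international plan` and `voice mail plan` hold yes/no strings.
- `area code` is stored as an integer, although it arguably labels a region rather than measuring a quantity.
- `phone number` is presumably a per-customer identifier with no predictive value.

The sketch below applies these fixes. It assumes the raw column names shown above and that the plan columns contain only "yes" and "no". `tidy_types` is a hypothetical helper, and the two-row frame reuses values from rows 0 and 3 of the first output.

```python
import pandas as pd


def tidy_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with label-like columns given more suitable dtypes
    (assumes the raw, space-separated column names)."""
    out = df.copy()
    # "yes"/"no" strings -> booleans; any other value would become NaN
    for col in ["international plan", "voice mail plan"]:
        out[col] = out[col].map({"yes": True, "no": False})
    # Treat area code as a category label rather than a number
    out["area code"] = out["area code"].astype("category")
    # Drop the (presumably unique) phone number, which carries no signal
    return out.drop(columns=["phone number"])


toy = pd.DataFrame({
    "area code": [415, 408],
    "phone number": ["382-4657", "375-9999"],
    "international plan": ["no", "yes"],
    "voice mail plan": ["yes", "no"],
})
tidied = tidy_types(toy)
print(tidied.dtypes)
```

`map` turns any value other than "yes"/"no" into NaN. Running `tidy_types(telecom_churn).isna().sum()` afterwards is therefore a quick check that nothing unexpected was silently lost. Whether to treat `area code` as categorical remains a modelling choice.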

    Exploratory Analysis

    # Normalize column names: strip whitespace, lower-case, replace spaces with underscores and drop parentheses
    telecom_churn.columns = (telecom_churn.columns.str.strip().str.lower()
                             .str.replace(' ', '_', regex=False)
                             .str.replace('(', '', regex=False)
                             .str.replace(')', '', regex=False))
    telecom_churn
    state account_length area_code phone_number international_plan voice_mail_plan number_vmail_messages total_day_minutes total_day_calls total_day_charge total_eve_minutes total_eve_calls total_eve_charge total_night_minutes total_night_calls total_night_charge total_intl_minutes total_intl_calls total_intl_charge customer_service_calls churn
    0 KS 128 415 382-4657 no yes 25 265.1 110 45.07 197.4 99 16.78 244.7 91 11.01 10.0 3 2.70 1 False
    1 OH 107 415 371-7191 no yes 26 161.6 123 27.47 195.5 103 16.62 254.4 103 11.45 13.7 3 3.70 1 False
    2 NJ 137 415 358-1921 no no 0 243.4 114 41.38 121.2 110 10.30 162.6 104 7.32 12.2 5 3.29 0 False
    3 OH 84 408 375-9999 yes no 0 299.4 71 50.90 61.9 88 5.26 196.9 89 8.86 6.6 7 1.78 2 False
    4 OK 75 415 330-6626 yes no 0 166.7 113 28.34 148.3 122 12.61 186.9 121 8.41 10.1 3 2.73 3 False
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    3328 AZ 192 415 414-4276 no yes 36 156.2 77 26.55 215.5 126 18.32 279.1 83 12.56 9.9 6 2.67 2 False
    3329 WV 68 415 370-3271 no no 0 231.1 57 39.29 153.4 55 13.04 191.3 123 8.61 9.6 4 2.59 3 False
    3330 RI 28 510 328-8230 no no 0 180.8 109 30.74 288.8 58 24.55 191.9 91 8.64 14.1 6 3.81 2 False
    3331 CT 184 510 364-6381 yes no 0 213.8 105 36.35 159.6 84 13.57 139.2 137 6.26 5.0 10 1.35 2 False
    3332 TN 74 415 400-4344 no yes 25 234.4 113 39.85 265.9 82 22.60 241.4 77 10.86 13.7 4 3.70 0 False

    3333 rows × 21 columns
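The column-renaming cell mentions removing special characters, but its chained `str.replace` calls only handle spaces and parentheses. A regular expression generalizes this. The sketch below lower-cases a name and collapses every run of characters other than ASCII letters and digits into a single underscore. `clean_column_name` and the example inputs are hypothetical. For this dataset's columns, which contain only letters and spaces, it yields the same names as the cell above.

```python
import re


def clean_column_name(name: str) -> str:
    """Lower-case a column name and replace each run of characters that is
    not an ASCII letter or digit with one underscore (trimmed at the ends)."""
    name = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return name.strip("_")


print([clean_column_name(c) for c in ["total day charge", " Customer Service Calls ", "charge ($)"]])
```

It could be applied with `telecom_churn.columns = [clean_column_name(c) for c in telecom_churn.columns]`.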

    profile = ProfileReport(telecom_churn, title="Telecom Churn Report")
    profile.to_notebook_iframe()
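The profiling report summarizes each column on its own. A grouped churn rate is a quick complement that shows how a single categorical feature relates to the target. The sketch below assumes the cleaned column names and a boolean `churn` column. `churn_rate_by` is a hypothetical helper, and the toy frame is made up rather than taken from the dataset.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str, target: str = "churn") -> pd.DataFrame:
    """Churn rate (mean of the boolean target) and group size for each value
    of `column`, sorted from highest to lowest churn rate."""
    summary = df.groupby(column)[target].agg(churn_rate="mean", customers="size")
    return summary.sort_values("churn_rate", ascending=False)


# Toy data (hypothetical, not from the dataset)
toy = pd.DataFrame({
    "international_plan": ["yes", "yes", "no", "no", "no", "no"],
    "churn": [True, False, False, False, True, False],
})
result = churn_rate_by(toy, "international_plan")
print(result)
```

On the real data, `churn_rate_by(telecom_churn, "international_plan")` or `churn_rate_by(telecom_churn, "customer_service_calls")` would return one row per distinct value. Each row also shows how many customers fall into that group, which helps flag small groups whose extreme rates may be noise.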
    email: tulasiram.gunipati@gmail.com